Machine learning segmentation #62

leonardosepulveda · 2020-04-23T01:29:58Z

I changed the name of nuclei_segmentation_cv2 to "machine_learning_segmentation". I will keep the cv2 segmentation capability and add a new class to allow the implementation of machine learning approaches. I will implement ilastik, U-nets and cellpose to begin with. The user will need to provide a trained model that can be used to segment either nuclei or cells. I will add a test function to compare the results of all available methods in a few FOVs.

pep8speaks · 2020-04-23T01:44:58Z

Hello @leonardosepulveda! Thanks for updating this PR.

In the file merlin/analysis/segment.py:

Line 293:72: W291 trailing whitespace

In the file merlin/util/segmentation.py:

Line 139:9: E128 continuation line under-indented for visual indent
Line 184:22: E261 at least two spaces before inline comment
Line 184:22: E262 inline comment should start with '# '
Line 185:24: E261 at least two spaces before inline comment
Line 185:24: E262 inline comment should start with '# '
Line 186:23: E261 at least two spaces before inline comment
Line 186:23: E262 inline comment should start with '# '
Line 285:59: W291 trailing whitespace
Line 295:77: W291 trailing whitespace
Line 297:71: W291 trailing whitespace
Line 363:49: E221 multiple spaces before operator
Line 363:52: E222 multiple spaces after operator
Line 363:60: E222 multiple spaces after operator
Line 363:81: E501 line too long (93 > 80 characters)
Line 402:58: W291 trailing whitespace
Line 513:67: W291 trailing whitespace
Line 560:61: E251 unexpected spaces around keyword / parameter equals
Line 561:49: E127 continuation line over-indented for visual indent
Line 562:65: E251 unexpected spaces around keyword / parameter equals
Line 563:49: E127 continuation line over-indented for visual indent

In the file merlin/util/watershed.py:

Line 143:46: W292 no newline at end of file

In the file test/conftest.py:

Line 62:81: E501 line too long (83 > 80 characters)
Line 64:81: E501 line too long (91 > 80 characters)
Line 69:81: E501 line too long (86 > 80 characters)

In the file test/test_merfish_segmentation_cellpose.py:

Line 12:81: E501 line too long (83 > 80 characters)

Comment last updated at 2021-01-08 21:09:37 UTC

merlin-pep8speaks · 2020-04-23T01:44:58Z

Hello @leonardosepulveda! Thanks for updating this PR.

In the file merlin/analysis/segment.py:

Line 141:67: W291 trailing whitespace
Line 142:72: W291 trailing whitespace
Line 143:72: W291 trailing whitespace
Line 153:1: W293 blank line contains whitespace

codecov · 2020-04-23T01:49:32Z

Codecov Report

Merging #62 (0ac2e2e) into v0.1.7 (8e2cb01) will increase coverage by 0.16%.
The diff coverage is 83.38%.

@@            Coverage Diff             @@
##           v0.1.7      #62      +/-   ##
==========================================
+ Coverage   87.33%   87.49%   +0.16%     
==========================================
  Files          57       59       +2     
  Lines        5108     5343     +235     
==========================================
+ Hits         4461     4675     +214     
- Misses        647      668      +21

Impacted Files	Coverage Δ
merlin/merlin.py	`74.28% <ø> (ø)`
merlin/util/dataportal.py	`87.70% <ø> (ø)`
merlin/util/segmentation.py	`78.97% <78.97%> (ø)`
merlin/analysis/segment.py	`90.95% <89.87%> (+1.15%)`	⬆️
merlin/core/dataset.py	`90.88% <100.00%> (+0.02%)`	⬆️
test/conftest.py	`98.46% <100.00%> (+0.04%)`	⬆️
test/test_merfish_segmentation_cellpose.py	`100.00% <100.00%> (ø)`
test/test_merfish_segmentation_cv2.py	`100.00% <100.00%> (ø)`
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e2cb01...8d112f6. Read the comment docs.

leonardosepulveda · 2020-07-01T22:57:54Z

I would like to merge this branch to master. I did tests and the new machine learning segmentation using cellpose is working, as well as the watershed CV2 implementation.However, there are some incompatibilities with merlin/utils/dataportal.py. How should I solve that issue?

emanuega · 2020-07-02T00:44:22Z

What is the incompatible you're finding with dataportal.py? I prefer for all the raw data access to go through the dataportal so that it is easy to switch between AWS, GC, or local storage.

leonardosepulveda · 2020-07-02T14:52:33Z

I am trying to solve the issues that appear in workflow and ci. So far I understand the output from both so I can go back into the code and make modifications. However, now the build stops in the testing part, but it is unclear to me where the problem is:

#!/bin/bash -eo pipefail 
source ~/miniconda/bin/activate root 
conda activate base 
source activate merlin_env 
cd ~/MERlin 
mkdir ~/test-reports 
pytest --cov --cov-report=xml 
 
============================= test session starts ==============================
 
platform linux -- Python 3.6.10, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/circleci/MERlin
 plugins: cov-2.10.0
 collecting ... collecting 71 items                                                            collecting 96 items                                                            collected 96 items                                                             
 
test/test_barcode_database.py ..........                                 [ 10%]
 test/test_binary_utils.py ..                                             [ 12%]
test/test_codebook.py ...........                                        [ 23%]
test/test_core.py .....................                                  [ 45%]
test/test_dataorganization.py ..........                                 [ 56%]
 test/test_dataportal.py ............                                     [ 68%]
test/test_dataset.py ...                                                 [ 71%]
 test/test_image_reader.py .                                              [ 72%]
  
Too long with no output (exceeded 10m0s): context deadline exceeded

Any insights on how to debug? other place where I should look?

emanuega · 2020-07-02T16:59:50Z

It looks like you are a few commits behind the master and are incorporating some of the changes independently. Can you rebase this branch onto the head of the master branch so we can get all the commits incorporated? (see https://medium.com/@gabriellamedas/git-rebase-and-git-rebase-onto-a6a3f83f9cce). Then we can see if this problem persists.

leonardosepulveda · 2020-07-06T18:41:48Z

Thanks for the tip on rebase. It seems that the things started to work well, but the problem comes again with the changes I added to dataportal.py to make it look like the updated branch. See below what I get:

[lsepulvedaduran@boslogin03 MERlin]$ git rebase master machine_learning_segmentation
First, rewinding head to replay your work on top of it...
Applying: Adding _get_membrane_mask and _get_nuclei_mask functions to WatershedSegmentNucleiCV2 class
...
Applying: add george's modifications to dataportal
Using index info to reconstruct a base tree...
M       requirements.txt
M       test/test_dataportal.py
Falling back to patching base and 3-way merge...
Auto-merging requirements.txt
Applying: adding additional modifications to aws
Using index info to reconstruct a base tree...
M       merlin/util/dataportal.py
Falling back to patching base and 3-way merge...
Auto-merging merlin/util/dataportal.py
CONFLICT (content): Merge conflict in merlin/util/dataportal.py
Failed to merge in the changes.
Patch failed at 0066 adding additional modifications to aws
The copy of the patch that failed is found in:
   /n/home06/lsepulvedaduran/Software/MERlin/.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

[lsepulvedaduran@boslogin03 MERlin]$

I looked at the patch 0066 , this is what says:

_[lsepulvedaduran@boslogin03 rebase-apply]$ more 0066
From 60acd7246a1a06562dd50a0147ccfb3d810dd35e Mon Sep 17 00:00:00 2001
From: leonardosepulveda <[email protected]>
Date: Sat, 15 Feb 2020 09:38:38 -0500
Subject: adding additional modifications to aws

---
 merlin/util/dataportal.py | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/merlin/util/dataportal.py b/merlin/util/dataportal.py
index e3d0f6e832b69a6491bee876abd741b44844ddb9..9fcd4631bcb603feeeddee777b1f1d9b3f8ffb0e 100644
--- a/merlin/util/dataportal.py
+++ b/merlin/util/dataportal.py
@@ -2,9 +2,11 @@ import os
 import boto3
 import botocore
 from google.cloud import storage
+from google.cloud import exceptions
 from urllib import parse
 from abc import abstractmethod, ABC
 from typing import List
+from time import sleep


 class DataPortal(ABC):
@@ -340,12 +342,41 @@ class GCloudFilePortal(FilePortal):
         return GCloudFilePortal(
             self._exchange_extension(newExtension), self._client)

+    def _error_tolerant_access(self, request, attempts = 5, wait = 60):
+        while attempts > 0:
+            try:
+                file = self._fileHandle.download_as_string()
+                return file
+            except (exceptions.GatewayTimeout, exceptions.ServiceUnavailable):
+                attempts -= 1
+                sleep(60)
+                if attempts == 0:
+                    raise
+
     def read_as_text(self):
-        return self._fileHandle.download_as_string().decode('utf-8')
+         backoffSeries= [1,2,4,8,16,32,64,128,256]
+        for sleepDuration in backoffSeries:
+            try:
+                file = self._fileHandle.download_as_string().decode('utf-8')
+                return file
+            except (exceptions.GatewayTimeout, exceptions.ServiceUnavailable):
+                if sleepDuration == backoffSeries[-1]:
+                    raise
+                else:
+                    sleep(sleepDuration)

     def read_file_bytes(self, startByte, endByte):
-        return self._fileHandle.download_as_string(
-            start=startByte, end=endByte-1)
+        backoffSeries= [1,2,4,8,16,32,64,128,256]
+            for sleepDuration in backoffSeries:
+            try:
+                file = self._fileHandle.download_as_string(start=startByte,
+                                                           end=endByte-1)
+                return file
+            except (exceptions.GatewayTimeout, exceptions.ServiceUnavailable):
+                if sleepDuration == backoffSeries[-1]:
+                    raise
+                else:
+                    sleep(sleepDuration)

     def close(self) -> None:
         pass
--
1.8.3.1_

Any idea on how to proceed?

HazenBabcock · 2020-07-06T20:17:01Z

If you open merlin/util/dataportal.py you should see some notes about merge conflicts, fix these and then you should be able to continue with the rebase.

leonardosepulveda · 2020-07-07T20:38:36Z

I think snakemake is hanging. I encountered this in my latest branch and updated the default snakemake parameters. Does 0f3a617 help?

I added the modifications to snakemakeParameters in [0f3a617], but the problem persists.

leonardosepulveda · 2020-07-07T22:55:00Z

I found the problem. It was in the test_merfish part. I added some tasks for the new segmentation, and the problem seems to come from there. I created independent tests for debugging.

leonardosepulveda · 2020-07-08T15:24:35Z

The build run to the end now but the new test files I created cannot be found, this is the complaints

FileNotFoundError: [Errno 2] No such file or directory: '/home/circleci/MERlin/test_analysis_parameters/test_analysis_segmentation_cellpose.json'
 
FileNotFoundError: [Errno 2] No such file or directory: '/home/circleci/MERlin/test_analysis_parameters/test_analysis_segmentation_cv2.json'

the folder test_analysis_parameters do not exist in my current branch, is there some data reorganization that I am missing?

emanuega · 2020-07-08T15:45:16Z

While running the tests it creates this directory and copies the files from the test/auxiliary files directory. You may have to update the base_files function in test/conftest.py to copy your test_analysis_segmentation json configurations into this folder.

… tests

leonardosepulveda · 2020-07-08T21:55:00Z

Thanks. Now the tests launch, but they are crashing for a reason I do not understand. I created the new tests by copying and modifying test_merfish.py and test_analysis_parameters.json. The current error is

______________________________ test_cv2_2d_local _______________________________

simple_merfish_data = <merlin.core.dataset.MERFISHDataSet object at 0x7f4636cba978>
     
    @pytest.mark.fullrun
    @pytest.mark.slowtest
    def test_cv2_2d_local(simple_merfish_data):
       	with open(os.sep.join([merlin.ANALYSIS_PARAMETERS_HOME,
                               'test_analysis_segmentation_cv2.json']), 'r') as f:
            snakefilePath = m.generate_analysis_tasks_and_snakefile(
                simple_merfish_data, f)
>           m.run_with_snakemake(simple_merfish_data, snakefilePath, 5)
 
test/test_segmentation_cv2.py:15: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  
merlin/merlin.py:194: in run_with_snakemake
    for t in dataSet.get_analysis_tasks()}
 
merlin/merlin.py:194: in <dictcomp>
    for t in dataSet.get_analysis_tasks()}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  
 
self = <merlin.core.dataset.MERFISHDataSet object at 0x7f4636cba978>
analysisTaskName = 'RandomNumberParallelAnalysisTask'
    def load_analysis_task(self, analysisTaskName: str) \
            -> analysistask.AnalysisTask:
        loadName = os.sep.join([self.get_task_subdirectory(
            analysisTaskName), 'task.json'])
        print(loadName)
>       with open(loadName, 'r') as inFile:
E       FileNotFoundError: [Errno 2] No such file or directory: '/home/circleci/MERlin/test_analysis/merfish_test/RandomNumberParallelAnalysisTask/tasks/task.json'

merlin/core/dataset.py:620: FileNotFoundError

For what I understand that random number task is not called in the text_merfish code, so I do not know why it is complaining. I notice that the test that is run before the segmentation tests is test_plotting, which calls the randomNumber tasks. Any insights?

leonardosepulveda · 2020-07-09T18:43:02Z

I changed the name of the tests from test_segmentation... to test_merfish_segmentation... and the workflow works now. The only difference is that the tests run before test_plotting. Is this because the results of each analysis are in the same folder, and when trying the new analysis it reads previous analysis too? This seems to be a bug as I would not expect that the order of the execution should matter. Any ideas on how to solve it? maybe clean the folder after each test?

Now the codecov patch is the only part that is not passing. Most of the code I wrote is in segmentation.py., and has a coverage of 78%. I would need to create a 3D segmentation test to increase the coverage. Could you merge this branch to the main now or you want me to create 3D tests?

shavan2600 · 2021-01-29T00:32:25Z

Good

leonardosepulveda requested a review from emanuega July 1, 2020 22:45

leonardosepulveda self-assigned this Jul 1, 2020

emanuega changed the base branch from master to v0.1.7 July 2, 2020 00:42

leonardosepulveda added 9 commits July 6, 2020 14:16

moving utility functions from segment.py to watershed.py

4cf60db

removing utility functions from segment.py

dafa7b1

add comments to the header of multiple methods.

3e1bddd

remove commented methods

5352938

pep8 compliance

8333d39

pep8 compliance

6e0e6f9

pep8 compliance

bbcf37d

pep8 compliance

5bef0ce

add george's modifications to dataportal

0b584ff

leonardosepulveda added 9 commits July 6, 2020 16:55

adding additional modifications to aws

3a60dc8

fixing indentation

ba4935e

pep8 compliance

b9e01eb

added printing for debuging

2cd12e3

added printing for debuging

e2c466e

added printing for debuging

eef6d92

removing self calls

ce8f8c1

remove self calls

70b6591

pep8 compliance

f134a3e

leonardosepulveda added 2 commits July 7, 2020 13:35

add snakemakeParameters

9a59d35

increase latency_wait

673edaa

leonardosepulveda added 2 commits July 7, 2020 16:55

remove MLS and CV2 from merfish test

8c663b8

add test for new segmentation routines

323e880

leonardosepulveda added 7 commits July 8, 2020 12:06

add segmentation jsons to conftest

a1055c0

added unique task analysis_name for cellpose and cv2 tests

ef8ed16

hard-coded to variable array size in convert_grayscale_to_rgb

20c659a

added print statement for debogging

0bbf2c7

change test function names; added fiducial warp and globalign for seg…

2979196

… tests

add missing commas to json file

fff22c0

decrease snakemake latency wait

959a104

leonardosepulveda added 7 commits July 9, 2020 12:19

comment snakemake params

de2313d

fix comment indentation

06e9149

change json files for test segmentation

039eaf6

update conftest with new json names

1940353

corrected variable typo

1e90523

change test name

caf0518

simplify test json

b9f7088

leonardosepulveda added 5 commits July 9, 2020 14:46

change json name in test py file

5bbd4f6

simplify json to increase codecov

9f2e68a

100>1 in the cv2 segmentation

0ac2e2e

add flow_threshold and cellpob_threshold parameters

1b35306

adding flow_thresh and cellprob to machineLearningSegment class

8d112f6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Machine learning segmentation #62

Machine learning segmentation #62

leonardosepulveda commented Apr 23, 2020

pep8speaks commented Apr 23, 2020 •

edited

Loading

merlin-pep8speaks commented Apr 23, 2020

codecov bot commented Apr 23, 2020 •

edited

Loading

leonardosepulveda commented Jul 1, 2020

emanuega commented Jul 2, 2020

leonardosepulveda commented Jul 2, 2020 •

edited

Loading

emanuega commented Jul 2, 2020

leonardosepulveda commented Jul 6, 2020 •

edited

Loading

HazenBabcock commented Jul 6, 2020

leonardosepulveda commented Jul 7, 2020

leonardosepulveda commented Jul 7, 2020

leonardosepulveda commented Jul 8, 2020

emanuega commented Jul 8, 2020

leonardosepulveda commented Jul 8, 2020

leonardosepulveda commented Jul 9, 2020 •

edited

Loading

shavan2600 commented Jan 29, 2021

Machine learning segmentation #62

Are you sure you want to change the base?

Machine learning segmentation #62

Conversation

leonardosepulveda commented Apr 23, 2020

pep8speaks commented Apr 23, 2020 • edited Loading

Comment last updated at 2021-01-08 21:09:37 UTC

merlin-pep8speaks commented Apr 23, 2020

codecov bot commented Apr 23, 2020 • edited Loading

Codecov Report

leonardosepulveda commented Jul 1, 2020

emanuega commented Jul 2, 2020

leonardosepulveda commented Jul 2, 2020 • edited Loading

emanuega commented Jul 2, 2020

leonardosepulveda commented Jul 6, 2020 • edited Loading

HazenBabcock commented Jul 6, 2020

leonardosepulveda commented Jul 7, 2020

leonardosepulveda commented Jul 7, 2020

leonardosepulveda commented Jul 8, 2020

emanuega commented Jul 8, 2020

leonardosepulveda commented Jul 8, 2020

leonardosepulveda commented Jul 9, 2020 • edited Loading

shavan2600 commented Jan 29, 2021

pep8speaks commented Apr 23, 2020 •

edited

Loading

codecov bot commented Apr 23, 2020 •

edited

Loading

leonardosepulveda commented Jul 2, 2020 •

edited

Loading

leonardosepulveda commented Jul 6, 2020 •

edited

Loading

leonardosepulveda commented Jul 9, 2020 •

edited

Loading